Solving package dependencies: from EDOS to Mancoosi
Mancoosi (Managing the Complexity of the Open Source Infrastructure) is an
ongoing research project funded by the European Union that addresses some of
the challenges related to the "upgrade problem" of interdependent software
components, of which Debian packages are prototypical examples. Mancoosi is
the natural continuation of the EDOS project, which has already contributed
tools for distribution-wide quality assurance in Debian and other GNU/Linux
distributions. The consortium behind the project consists of several European
public and private research institutions, as well as commercial GNU/Linux
distributions from Europe and South America. Debian is represented by a small
group of Debian Developers who work within the participating universities to
steer the project and integrate its results back into Debian. This paper
presents relevant EDOS results in dependency management and gives an overview
of the Mancoosi project and its objectives, with a particular focus on the
prospective benefits for Debian.
Expressing advanced user preferences in component installation
State-of-the-art component-based software collections, such as FOSS
distributions, are made of up to tens of thousands of components, with
complex inter-dependencies and conflicts. Given a particular installation of
such a system, each request to alter the set of installed components has
potentially (too) many satisfying answers. We present an architecture that
allows users to express advanced preferences about package selection in FOSS
distributions. The architecture is composed of a distribution-independent
format for describing available and installed packages, called CUDF (Common
Upgradeability Description Format), and a foundational language for
specifying optimization criteria, called MooML. We present the syntax and
semantics of CUDF and MooML, and discuss the partial evaluation mechanism of
MooML, which improves the efficiency of package dependency solvers.
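
To make the notion of user preferences concrete, the following Python
fragment is a minimal sketch of the kind of lexicographic optimization
criterion such an architecture can express (illustrative only: the paper's
actual mechanism is the MooML language, whose syntax is not reproduced
here). Among candidate final states, it prefers the one that removes the
fewest installed packages, breaking ties by the number of changed packages.

    def removed(initial, final):
        """Packages installed initially but absent from the final state."""
        return {p for p in initial if p not in final}

    def changed(initial, final):
        """Packages whose installed version differs between the states."""
        keys = set(initial) | set(final)
        return {p for p in keys if initial.get(p) != final.get(p)}

    def best_solution(initial, candidates):
        # Lexicographic minimization: fewest removals, then fewest changes.
        return min(candidates,
                   key=lambda final: (len(removed(initial, final)),
                                      len(changed(initial, final))))

    # Two ways to satisfy a request; the second avoids removing 'bar'.
    installed = {"foo": 1, "bar": 2}
    plans = [{"foo": 2}, {"foo": 2, "bar": 2}]
    assert best_solution(installed, plans) == {"foo": 2, "bar": 2}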
Description of the CUDF Format
This document contains several related specifications that together describe
the document formats used in the solver competition organized by Mancoosi.
In particular, it describes: DUDF (Distribution Upgradeability Description
Format), the format used to submit upgrade problem instances from user
machines to a (distribution-specific) database of upgrade problems; and CUDF
(Common Upgradeability Description Format), the format used to encode
upgrade problems while abstracting over distribution-specific details.
Solvers taking part in the competition will be fed input in CUDF format.
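
For illustration, a tiny CUDF document might look as follows (a hand-written
sketch following the published CUDF specification, not an excerpt from this
document): package stanzas describe the known package universe with
dependencies and installation status, and a trailing request stanza states
the change the user asks for.

    package: car
    version: 1
    depends: engine >= 1, wheel
    installed: true

    package: engine
    version: 1
    installed: true

    package: engine
    version: 2

    package: wheel
    version: 3
    installed: true

    request: 
    upgrade: engine

In the competition setting, a solver answers with a similar listing of
package stanzas describing the resulting installation.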
Efficient Prior Publication Identification for Open Source Code
Free/Open Source Software (FOSS) enables large-scale reuse of preexisting
software components. The main drawback is increased complexity in software
supply chain management. A common approach to taming such complexity is
automated open source compliance, which consists in automating the
verification of adherence to various open source management best practices
concerning license obligation fulfillment, vulnerability tracking, software
composition analysis, and nearby concerns. We consider the problem of
auditing a source code base to determine which of its parts have been
published before, an important building block of automated open source
compliance toolchains. Indeed, if source code allegedly developed in house
is recognized as having been previously published elsewhere, alerts should
be raised to investigate where it comes from and whether additional
obligations must be fulfilled before product shipment. We propose an
efficient approach for prior publication identification that relies on a
knowledge base of known source code artifacts, linked together in a global
Merkle directed acyclic graph, and a dedicated discovery protocol. We
introduce swh-scanner, a source code scanner that implements the proposed
approach in practice, using as knowledge base Software Heritage, the largest
public archive of source code artifacts. We validate the proposed approach
experimentally, showing its efficiency in both abstract (number of queries)
and concrete (wall-clock time) terms, with benchmarks on 16,845 real-world
public code bases of various sizes, from small to very large.
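
The gist of the approach can be conveyed in a few lines of Python. Each file
content is identified by its SWHID, an intrinsic identifier derived from the
bytes alone (reusing Git's blob hashing convention), and the knowledge base
is asked which identifiers it already holds. The sketch below assumes the
Software Heritage Web API's /api/1/known/ endpoint, which takes a list of
SWHIDs and reports which are archived; the real swh-scanner is more
elaborate, e.g., walking the directory Merkle DAG top-down to skip querying
files whose enclosing directory is already known to the archive.

    import hashlib, json, sys
    from pathlib import Path
    from urllib.request import Request, urlopen

    def content_swhid(data: bytes) -> str:
        # SWHIDs for file contents reuse the Git blob convention:
        # sha1 over b"blob <length>\0" followed by the raw bytes.
        h = hashlib.sha1(b"blob %d\x00" % len(data))
        h.update(data)
        return "swh:1:cnt:" + h.hexdigest()

    def known(swhids):
        # Ask the archive which of the given SWHIDs it already holds.
        req = Request("https://archive.softwareheritage.org/api/1/known/",
                      data=json.dumps(swhids).encode(),
                      headers={"Content-Type": "application/json"})
        return json.load(urlopen(req))

    if __name__ == "__main__":
        files = [p for p in Path(sys.argv[1]).rglob("*") if p.is_file()]
        ids = {content_swhid(p.read_bytes()): p for p in files}
        for swhid, status in known(list(ids)).items():
            if status.get("known"):
                print(f"previously published: {ids[swhid]} ({swhid})")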
Towards maintainer script modernization in FOSS distributions
Free and Open Source Software (FOSS) distributions are complex software
systems, made of thousands of packages that evolve rapidly, independently,
and without centralized coordination. During package upgrades, corner-case
failures can be encountered that are hard to deal with, especially when they
are due to misbehaving maintainer scripts: executable code snippets used to
finalize package configuration. In this paper we report on a software
modernization experience, i.e., the process of representing existing legacy
systems in terms of models, applied to FOSS distributions. We present a
process to define meta-models that enable dealing with upgrade failures and
help rolling back from them, taking maintainer scripts into account. The
process has been applied to widely used FOSS distributions, and we report on
these experiences.
Where are your Manners? Sharing Best Community Practices in the Web 2.0
The Web 2.0 fosters the creation of communities by offering users a wide
array of social software tools. While the success of these tools rests on
their ability to support different interaction patterns among users while
imposing as few limitations as possible, the communities they support are
not free of rules (consider, for example, the posting rules of a community
forum or the editing rules of a thematic wiki). In this paper we propose a
framework for sharing best community practices in the form of a (potentially
rule-based) annotation layer that can be integrated with existing Web 2.0
community tools (with a specific focus on wikis). This solution is
characterized by minimal intrusiveness and fits the open spirit of the Web
2.0 by providing users with behavioral hints rather than enforcing strict
adherence to a set of rules. Comment: ACM Symposium on Applied Computing,
Honolulu, United States (2009).
Gender Differences in Public Code Contributions: a 50-year Perspective
Gender imbalance in information technology in general, and in Free/Open
Source Software specifically, is a well-known problem in the field. Still,
little is known about the large-scale extent and long-term trends that
underpin the phenomenon. We contribute to filling this gap by conducting a
longitudinal study of the population of contributors to publicly available
software source code. We analyze 1.6 billion commits, corresponding to the
development history of 120 million projects, contributed by 33 million
distinct authors over a period of 50 years. We classify author names by
gender and study their evolution over time. We show that, while the amount
of commits by female authors remains low overall, there is evidence of a
stable long-term increase in their proportion over all contributions,
providing hope of a more gender-balanced future for collaborative software
development.
Determining the Intrinsic Structure of Public Software Development History
Background. Collaborative software development has produced a wealth of
version control system (VCS) data that can now be analyzed in full. Little
is known about the intrinsic structure of the entire corpus of publicly
available VCS as an interconnected graph. Understanding its structure is
needed to determine the best approach to analyze it in full and to avoid
methodological pitfalls when doing so. Objective. We intend to determine the
most salient network topology properties of public software development
history as captured by VCS. We will explore: degree distributions,
determining whether they are scale-free or not; the distribution of
connected component sizes; and the distribution of shortest path lengths.
Method. We will use Software Heritage, the largest corpus of public VCS
data, compress it using webgraph compression techniques, and analyze it
in-memory using classic graph algorithms. Analyses will be performed both on
the full graph and on relevant subgraphs. Limitations. The study is
exploratory in nature; as such, no hypotheses on the findings are stated at
this time. The chosen graph algorithms are expected to scale to the corpus
size, but this will need to be confirmed experimentally. External validity
will depend on how representative Software Heritage is of the software
commons. Comment: MSR 2020 - 17th International Conference on Mining
Software Repositories, Oct 2020, Seoul, South Korea.
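
To give a sense of the planned analyses, the following Python sketch
computes the three properties listed above on a toy graph using networkx;
the study itself cannot use such in-memory adjacency structures directly and
instead relies on webgraph compression to fit the full corpus in RAM.

    from collections import Counter
    import networkx as nx

    # Toy stand-in for the VCS graph: nodes are revisions, directories,
    # and file contents; edges point to the objects each node references.
    g = nx.DiGraph([("rev2", "rev1"), ("rev2", "dir2"),
                    ("rev1", "dir1"), ("dir2", "dir1"),
                    ("dir1", "cnt1"), ("dir2", "cnt2")])

    # 1. Degree distribution: how many nodes have in-degree k?
    in_deg = Counter(d for _, d in g.in_degree())
    print("in-degree distribution:", dict(in_deg))

    # 2. Connected component sizes (weak connectivity, ignoring direction).
    sizes = sorted((len(c) for c in nx.weakly_connected_components(g)),
                   reverse=True)
    print("component sizes:", sizes)

    # 3. Distribution of shortest path lengths between reachable pairs.
    lengths = Counter(l for _, ls in nx.shortest_path_length(g)
                      for l in ls.values() if l > 0)
    print("shortest path lengths:", dict(lengths))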
Content-Based Textual File Type Detection at Scale
Programming language detection is a common need in the analysis of large
source code bases. It is supported by a number of existing tools that rely
on several features, most notably file extensions, to determine file types.
We consider the problem of accurately detecting the type of files commonly
found in software code bases based solely on textual file content. Doing so
is helpful to classify source code that lacks file extensions (e.g., code
snippets posted on the Web or executable scripts), to avoid misclassifying
source code recorded with wrong or uncommon file extensions, and also to
shed some light on the intrinsic recognizability of source code files. We
propose a simple model that (a) uses a language-agnostic word tokenizer for
textual files, (b) groups tokens into 1-/2-grams, (c) builds feature vectors
based on N-gram frequencies, and (d) uses a simple fully connected neural
network as classifier. As training set we use textual files extracted from
GitHub repositories with at least 1000 stars, using existing file extensions
as ground truth. Despite its simplicity, the proposed model reaches 85%
accuracy in our experiments for a relatively high number of recognized
classes (more than 130 file types).
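
Steps (a) through (d) map naturally onto off-the-shelf components. The
following scikit-learn sketch mirrors the shape of that pipeline on a toy
corpus; it is an illustration under those assumptions, not the paper's
implementation, which uses its own tokenizer, training set, and network.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline

    # Tiny illustrative corpus; the paper instead trains on files from
    # popular GitHub repositories, labeled by existing file extensions.
    samples = ["def main(): pass", "int main(void) { return 0; }",
               "print('hi')",      "printf(\"hi\\n\");"]
    labels  = ["python", "c", "python", "c"]

    model = make_pipeline(
        # (a)+(b): language-agnostic word tokenization into 1-/2-grams,
        # (c) turned into n-gram occurrence-count feature vectors...
        CountVectorizer(ngram_range=(1, 2), token_pattern=r"\S+"),
        # ...fed to (d) a simple fully connected neural network.
        MLPClassifier(hidden_layer_sizes=(64,), max_iter=500),
    )
    model.fit(samples, labels)
    print(model.predict(["void main() { puts(\"hi\"); }"]))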